REMARKS 



The application is believed to be in condition for allowance because the 
claims are novel and non-obvious over the cited art. The following paragraphs 
provide the justification for these beliefs. In view of the following reasoning for 
allowance, the applicants hereby respectfully request further examination and 
reconsideration of the subject application. 

The 35 USC 103 Rejection of Claims 1, 2, 9-12, 14-17, 19-21 and 55-61. 

Claims 1, 2, 9-12, 14-17, 19-21 and 55-61 were rejected under 35 USC 103(a) 
as being unpatentable over Konopka et al., U.S. Patent No. 5,850,250 (hereinafter 
Konopka), in view of Tai et al., U.S. Patent No. 6,577,333 (hereinafter Tai), and in 
further view of Taylor, U.S. Patent No. 7,113,201 (hereinafter Taylor). The Office 
Action stated that Konopka and Tai teach the applicants' claimed invention, but do 
not teach the applicants' claimed virtual director or that the server also captures the 
sub-events in addition to broadcasting the captured sub-events. However, the 
Examiner further contended that Taylor teaches this feature, rendering the 
applicants' claimed invention obvious. The applicants respectfully traverse this 
contention of obviousness. 

In order to deem the applicants' claimed invention unpatentable under 35 USC 
103, a prima facie showing of obviousness must be made. To make a prima facie 
showing of obviousness, all of the claimed elements of an applicant's invention must be 
considered, especially when they are missing from the prior art. If a claimed element is 
not taught in the prior art and has advantages not appreciated by the prior art, then no 
prima facie case of obviousness exists. The Federal Circuit court has stated that it was 
error not to distinguish claims over a combination of prior art references where a 
material limitation in the claimed system and its purpose was not taught therein (In Re 
Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988)). 

The applicants claim... 



"An automated system for capturing and viewing an event having event 
participants, comprising: 

multiple cameras of different types simultaneously capturing images of 
sub-events occurring in a space associated with an event, wherein the 
multiple cameras of different types are at least two of: 

a 360-degree camera centrally positioned to monitor in substantially 
360-degrees the space in which the event occurs; 

a remote view camera positioned so as to capture a view of event 
participants in said space associated with said event to be transmitted to a 
client over said network; 

a presenter view camera positioned so as to capture a view of an 
overview of the space associated with the event wherein a presenter would 
typically be presenting; and 

a whiteboard capture camera positioned so as to capture strokes 
written on a whiteboard; 

a virtual director that automatically determines which view of said 
multiple cameras of different types to display, wherein said virtual 
director determines which camera view to display by: 

determining if a person is speaking and facing toward a 
display that displays at least one remote event participant, and if so 
using a camera view captured by said remote camera to display; 

determining if a person is talking and the presenter view 
camera can track them and provide a higher resolution image than the 
360-degree camera, and if so using a camera view captured by said 
presenter view camera for display; and 

else, using a camera view captured by said 360-degree 
camera to display; 

a server capable of recording and broadcasting the captured 
sub-events; and 

one or more clients in network connection with said server that view 
portions of the captured event." (emphasis added) 

And, 

"A system for conducting a distributed meeting, the system comprising: 

a 360-degree camera for capturing images of meeting participants in a 
meeting in substantially 360 degrees about said 360-degree camera; 

a whiteboard camera for capturing images of contents written on a 
whiteboard; 

a presenter camera for capturing images of an overview of the meeting 
room in the area where a presenter would typically be presenting; 

a microphone array for capturing the audio of the meeting that is 
synchronized with one of said images captured by said 360-degree camera, 
whiteboard camera or presenter camera; and 

a virtual director that automatically determines which view of said 360 
degree camera, whiteboard camera or presenter camera to display and 
switches to the determined view of the associated camera to display a view of 
one of said different sub-events, wherein said virtual director determines 
which camera view to display by: 

determining if a person is talking and the presenter view camera 
can track them and provide a higher resolution image than the 360-degree 
camera, and if so using a camera view captured by said presenter view 
camera for display; and 

else, using a camera view captured by said 360-degree camera 
to display; and 

a meeting server for performing processing required to broadcast and 
record meeting data." (emphasis added) 



"An automated system for capturing and viewing an event having event 
participants, comprising: 

multiple cameras of different types simultaneously capturing images of 
different sub-events occurring in a space associated with an event; 

an event server, that processes in substantially real time said event 
data; 

an event post processor that process said event data only when the event is 
completed; 

a virtual director that automatically determines which of said multiple 
cameras of different types to display based on the position of a person 
speaking and the ability to track a person speaking in the captured images 
and audio signals received and switches between said multiple cameras of 
different types to display a view of one of said different sub-events; and 

at least one event client in connection with said event server wherein 
said event client allows viewing live events and archived events." (emphasis 
added) 

And, 

"A computer-readable medium having computer-executable 
instructions for viewing a recorded event, said computer-executable 
instructions comprising: 

simultaneously capturing images of different sub-events of an event 
with multiple cameras of different types each capturing a different sub-event; 

capturing audio associated with the different sub-events; 

automatically selecting which of the captured sub-events to transmit 
based on the position of a person speaking and the ability to track a 
person speaking in the captured images of the different sub-events and the 
captured audio associated with the different sub-events; and 

transmitting the selected captured sub-events and associated audio 
from a server to one or more clients in network connection with said server." 
(emphasis added) 

And, 

"A system for conducting a distributed meeting, the system comprising: 
a 360-degree camera for capturing images of meeting participants in a 
meeting room in substantially 360 degrees about said 360-degree camera, 
wherein said 360-degree camera includes an integrated computer that 
performs processing required to broadcast said images and associated 
meeting data; and 

a whiteboard camera for capturing images of contents written on a 
whiteboard; 

a presenter view camera for capturing images of an overview of the 
meeting room in a space where a presenter would typically be presenting; 
and 

a virtual director that automatically determines which view of said 360 
degree camera, whiteboard view camera and presenter view camera to 
display based on determining if a person is speaking and is positioned in a 
certain manner relative to one of the cameras and the ability to track the 
person speaking." (emphasis added) 

In contrast, Konopka discloses a video distance learning system including a 
teaching classroom connected to remote learning classrooms by a fiber-optic 
communication network. The teaching classroom includes a rear audio/video 
cabinet housing four video monitors and a camera. The remote classrooms have 
front cabinets with four monitors and a camera. In a normal operating mode, one of 
the video monitors will display the teacher, while the other three monitors display 
classroom images. A mounted rear video camera is focused on the teacher and a 
front video camera may be focused on the students. A graphics or document 
camera may also be provided on the front video cabinet. 
The document camera points downward at a light table to image materials such as 
books, pictures and overhead transparencies. The teacher may switch between the 
rear camera, the front camera and the document camera. A teacher's work station 
may be located at the front of the teaching classroom. A control panel allows the 
teacher to control all devices located within the room, such as volume, displays, or 
focus. The work station may also include a personal computer interfacing with the 
network to schedule classes. The video distance learning system facilitates eye 
contact between the teacher in a teaching classroom and students in remote 
classrooms. Konopka does not, however, teach the applicants' claimed virtual 
director that automatically determines which view of the multiple cameras of 
different types to display based on the positioning of a person speaking and 
the ability to track the speaker in images captured by the cameras and the 
associated audio of the person speaking, and automatically switches between 
the multiple cameras of different types to display a view of one of the different 
sub-events. 

Tai teaches a technique for automatically selecting a video output from 
among several video input sources based strictly on audio signals, not positioning 
of a person speaking in an event. In one method, one or more audio sensors are 
associated with each video input source. Preferably, an audio sensor is positioned 
to receive audio signals from directions that receive favorable coverage in the field 
of view of the associated video source. An autoselector calculates audio scores for 
each of the audio sensors over short (e.g., 0.5 seconds) examination intervals. At 
each examination interval, the potential exists for a different video source to be 
selected as the video output. The autoselector selects a video source based on the 
audio scores for an examination interval, as well as the recent time-history of video 
source selection. For instance, if a new video source has just been selected, 
selection of a different source may be disabled for a few seconds. The time-history 
is also used to increase the probability that source selection varies in a 
seemingly-natural manner. (Abstract) However, Tai does not teach 
the applicants' claimed virtual director that automatically determines which 
view of the multiple cameras of different types to display based on the 
positioning of a person speaking and the ability to track the speaker in images 
captured by the cameras and the associated audio of the person speaking, 
and automatically switches between the multiple cameras of different types to 
display a view of one of the different sub-events. 
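For illustration only, the audio-driven selection summarized from Tai's Abstract above can be sketched roughly as follows. This is a simplified reading and not Tai's actual implementation: the function name `autoselect`, the highest-score rule, and the 2.0 second hold-off are assumptions chosen for the sketch (the Abstract says only "a few seconds"), and the time-history weighting that makes switching appear natural is not modeled. The sketch shows that the only inputs are per-sensor audio scores; nothing in it uses the position of a speaker in captured images or the ability to track the speaker.

```python
def autoselect(scores_per_interval, interval_s=0.5, holdoff_s=2.0):
    """Pick one video source index per examination interval from audio scores alone.

    scores_per_interval: list of lists, one audio score per source per interval.
    interval_s: length of an examination interval (0.5 s in Tai's example).
    holdoff_s: hypothetical lock-out after a switch ("a few seconds" in Tai).
    """
    selected = None       # index of the currently selected source
    since_switch = 0.0    # seconds elapsed since the last switch
    choices = []
    for scores in scores_per_interval:
        # Assumption: the source with the highest audio score is the candidate.
        best = max(range(len(scores)), key=scores.__getitem__)
        if selected is None:
            selected, since_switch = best, 0.0
        else:
            since_switch += interval_s
            # Switching is disabled until the hold-off has elapsed.
            if best != selected and since_switch >= holdoff_s:
                selected, since_switch = best, 0.0
        choices.append(selected)
    return choices
```

In this sketch a louder second source is ignored until the hold-off expires, after which the output switches to it; no image data is consulted at any point.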

Granted, the Examiner states (with respect to Claim 8 later) that the same 
virtual director as the applicants claim is taught in Tai (FIGs. 2 and 6; Col. 3, line 
26 to Col. 4, line 8; and Col. 6, line 52 to Col. 7, line 17), but these passages make no 
mention of the positioning of a speaker (person speaking) in images captured by the 
cameras, nor do these passages mention the ability to track a speaker in the 
images. 

Taylor teaches an image processing apparatus where image data from a 
plurality of cameras capture the movements of a number of people, for example in a 
meeting, and sound data from a directional microphone array is processed by a 
computer processing apparatus to archive the data in a meeting archive database. 
The image data captured is processed to determine the three-dimensional position 
and orientation of each person's head and to determine at whom each person is 
looking. The sound data is processed to determine the direction from which the 
sound came. Processing is carried out to determine who is speaking by determining 
which person has his head in a position corresponding to the direction from which 
the sound came. Having determined which person is speaking, the personal speech 
recognition parameters for that person are selected and used to convert the sound 
data to text data. Image data to be archived is chosen by selecting the camera 
which best shows the speaking participant and the participant to whom he is 
speaking. Image data, sound data, text data and data defining at whom each 
person is looking is stored in the meeting archive database. (Abstract) 

Taylor does not, however, teach the applicants' claimed multiple cameras of 
different types simultaneously capturing images of sub-events occurring in a space 
associated with an event. In Taylor all the cameras are of the same type positioned 
so as to determine the positions of the meeting participants. Nor does Taylor teach 
the applicants' claimed virtual director that automatically 
determines which view of the multiple cameras of different types to display based on 
the positioning of a person speaking and the ability to track the speaker in images 
captured by the cameras and the associated audio of the person speaking, and 
automatically switches between the multiple cameras of different types to display a 
view of one of the different sub-events. In Taylor only a far view of the speaking 
meeting participant and to whom they are speaking is recorded. Most of the people 
speaking will be captured from behind as is evidenced from the positions of the 
cameras relative to the majority of the meeting participants (see FIG. 1). No close-up 
frontal views of a speaker can be displayed; no views specifically optimized to be 
transmitted to a remote participant can be displayed; and no whiteboard camera 
views can be displayed. 

Since neither Konopka nor Tai nor Taylor teaches the applicants' claimed 
virtual director that automatically 
determines which view of the multiple cameras of different types to display based on 
the positioning of a person speaking and the ability to track the speaker in images 
captured by the cameras and the associated audio of the person speaking, and 
automatically switches between the multiple cameras of different types to display a 
view of one of the different sub-events, the combination does not teach it. Thus, the 
applicants have claimed elements not taught in the cited art and which have 
advantages not recognized therein. Accordingly, no prima facie case of 
obviousness has been established in accordance with the holding of In Re Fine. 
This lack of prima facie showing of obviousness means that the rejected claims are 
patentable under 35 USC 103 over Konopka in view of Tai and in further view of 
Taylor. It is, therefore, respectfully requested that the rejection of Claims 1, 2, 9-12, 
14-17, 19-21 and 55-61 be reconsidered based on the novel and non-obvious 
exemplary claim language: 



"An automated system for capturing and viewing an event having 
event participants, comprising: 

multiple cameras of different types simultaneously capturing images of 
sub-events occurring in a space associated with an event, wherein the 
multiple cameras of different types are at least two of: 

a 360-degree camera centrally positioned to monitor in substantially 
360-degrees the space in which the event occurs; 

a remote view camera positioned so as to capture a view of event 
participants in said space associated with said event to be transmitted to a 
client over said network; 

a presenter view camera positioned so as to capture a view of an 
overview of the space associated with the event wherein a presenter would 
typically be presenting; and 

a whiteboard capture camera positioned so as to capture strokes 
written on a whiteboard; 

a virtual director that automatically determines which view of said 
multiple cameras of different types to display, wherein said virtual 
director determines which camera view to display by: 

determining if a person is speaking and facing toward a 
display that displays at least one remote event participant, and if so 
using a camera view captured by said remote camera to display; 

determining if a person is talking and the presenter view 
camera can track them and provide a higher resolution image than the 
360-degree camera, and if so using a camera view captured by said 
presenter view camera for display; and 

else, using a camera view captured by said 360-degree 
camera to display; 

a server capable of recording and broadcasting the captured 
sub-events; and 

one or more clients in network connection with said server that view 
portions of the captured event." (emphasis added) 
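
For illustration only, the order of tests recited for the virtual director in the claim language quoted above can be sketched as follows. The sketch is not part of the claims or the specification: the names `View` and `select_view` and the four boolean inputs are hypothetical, and how each condition would actually be detected (for example, from captured images and microphone audio) is outside the sketch.

```python
from enum import Enum


class View(Enum):
    REMOTE = "remote view camera"
    PRESENTER = "presenter view camera"
    PANORAMIC = "360-degree camera"


def select_view(speaking: bool,
                facing_remote_display: bool,
                presenter_can_track: bool,
                presenter_higher_resolution: bool) -> View:
    """Return the camera view to display, applying the recited tests in order."""
    # First recited test: a person is speaking and facing toward the display
    # that shows at least one remote event participant.
    if speaking and facing_remote_display:
        return View.REMOTE
    # Second recited test: a person is talking, and the presenter view camera
    # can track them and provide a higher resolution image than the
    # 360-degree camera.
    if speaking and presenter_can_track and presenter_higher_resolution:
        return View.PRESENTER
    # Otherwise, fall back to the 360-degree camera view.
    return View.PANORAMIC
```

Unlike a selection driven by audio levels alone, each branch here depends on where the speaker is facing or on whether a particular camera can track the speaker.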



The 35 USC 103 Rejection of Claims 3, 5 and 6. 

Claims 3, 5 and 6 were rejected under 35 USC 103(a) as being unpatentable 
over Konopka in view of Tai, in further view of Taylor and in further view of Ippolito, 
U.S. Patent No. 6,072,522 (hereinafter Ippolito). The Examiner stated that Konopka, 
Tai and Taylor teach the applicants' claimed invention, but do not teach cameras 
placed in a back-to-back fashion. However, the Examiner further contended that 
Ippolito teaches this feature, rendering the applicants' claimed invention obvious. The 
applicants respectfully disagree with this contention of obviousness. 



As discussed above, the applicants claim... 

"An automated system for capturing and viewing an event having event 
participants, comprising: 

multiple cameras of different types simultaneously capturing images of 
sub-events occurring in a space associated with an event, wherein the 
multiple cameras of different types are at least two of: 

a 360-degree camera centrally positioned to monitor in substantially 
360-degrees the space in which the event occurs; 

a remote view camera positioned so as to capture a view of event 
participants in said space associated with said event to be transmitted to a 
client over said network; 

a presenter view camera positioned so as to capture a view of an 
overview of the space associated with the event wherein a presenter would 
typically be presenting; and 

a whiteboard capture camera positioned so as to capture strokes 
written on a whiteboard; 

a virtual director that automatically determines which view of said 
multiple cameras of different types to display, wherein said virtual 
director determines which camera view to display by: 

determining if a person is speaking and facing toward a 
display that displays at least one remote event participant, and if so 
using a camera view captured by said remote camera to display; 

determining if a person is talking and the presenter view 
camera can track them and provide a higher resolution image than the 
360-degree camera, and if so using a camera view captured by said 
presenter view camera for display; and 

else, using a camera view captured by said 360-degree 
camera to display; 

a server capable of recording and broadcasting the captured 
sub-events; and 

one or more clients in network connection with said server that view 
portions of the captured event." (emphasis added) 

Neither Konopka, Taylor, Tai nor Ippolito teaches the applicants' claimed 
multiple cameras of different types simultaneously capturing images of 
sub-events occurring in a space associated with an event; or a virtual director that 
determines which view of the multiple cameras of different types to display 
based on the positioning of a speaker and the ability to track the speaker in 
captured images, and switches between the multiple cameras of different 
types to display a view of one of the different sub-events. 



Since neither Konopka, Tai, Taylor nor Ippolito teaches the applicants' claimed 
multiple cameras of different types simultaneously capturing images of 
sub-events occurring in a space associated with an event; or a virtual director that 
automatically determines which view of the multiple cameras of different types 
to display based on the positioning of a person speaking and the ability to 
track the speaker in images captured by the cameras, and the associated 
audio of the person speaking, and automatically switches between the 
multiple cameras of different types to display a view of one of the different 
sub-events, the combination does not teach it. Thus, the applicants have claimed 
elements not taught in the cited art and which have advantages not recognized 
therein. Accordingly, no prima facie case of obviousness has been established in 
accordance with the holding of In Re Fine. This lack of prima facie showing of 
obviousness means that the rejected claims are patentable under 35 USC 103 over 
Konopka in view of Tai, Taylor and Ippolito. It is, therefore, respectfully requested 
that the rejection of Claims 3, 5 and 6 be reconsidered based on the above-quoted 
claim language. 



The 35 USC 103 Rejection of Claim 4. 

Claim 4 was rejected under 35 USC 103(a) as being unpatentable over Konopka 
in view of Tai, in further view of Taylor, in further view of Ippolito and in yet further view 
of Liu et al., U.S. Patent No. 6,839,067 (hereinafter Liu). The Examiner stated that 
Konopka, Tai, Ippolito and Taylor teach the applicants' claimed invention, but do not 
teach a panoramic stitcher for stitching images together. However, the Examiner 
further contended that Liu teaches this feature, rendering the applicants' claimed 
invention obvious. The applicants respectfully disagree with this contention of 
obviousness. 



As discussed above, the applicants claim... 

"An automated system for capturing and viewing an event having event 
participants, comprising: 

multiple cameras of different types simultaneously capturing images of 
sub-events occurring in a space associated with an event, wherein the 
multiple cameras of different types are at least two of: 

a 360-degree camera centrally positioned to monitor in substantially 
360-degrees the space in which the event occurs; 

a remote view camera positioned so as to capture a view of event 
participants in said space associated with said event to be transmitted to a 
client over said network; 

a presenter view camera positioned so as to capture a view of an 
overview of the space associated with the event wherein a presenter would 
typically be presenting; and 

a whiteboard capture camera positioned so as to capture strokes 
written on a whiteboard; 

a virtual director that automatically determines which view of said 
multiple cameras of different types to display, wherein said virtual 
director determines which camera view to display by: 

determining if a person is speaking and facing toward a 
display that displays at least one remote event participant, and if so 
using a camera view captured by said remote camera to display; 

determining if a person is talking and the presenter view 
camera can track them and provide a higher resolution image than the 
360-degree camera, and if so using a camera view captured by said 
presenter view camera for display; and 

else, using a camera view captured by said 360-degree 
camera to display; 

a server capable of recording and broadcasting the captured 
sub-events; and 

one or more clients in network connection with said server that view 
portions of the captured event." (emphasis added) 

As discussed above, neither Konopka nor Tai nor Taylor nor Ippolito teaches the 
applicants' claimed multiple cameras of different types simultaneously capturing 
images of sub-events occurring in a space associated with an event; or a virtual 
director that automatically determines which view of the multiple cameras of 
different types to display based on the positioning of a person speaking and 
the ability to track the speaker in images captured by the cameras and the 
associated audio of the person speaking, and automatically switches between 
the multiple cameras of different types to display a view of one of the different 
sub-events. 

Liu teaches a method and apparatus for providing multi-resolution video to 
multiple users under hybrid human and automatic control. Initial environment and 
close-up images are captured using a first camera and a PTZ camera. The initial 
images are then stored in memory. Current environment and close-up images are 
captured and then an estimated difference between the initial and current images 
and the true image is determined. The estimated differences are weighted and 
compared and the stored images are updated. A close-up image is then provided to 
each user of the system. The close-up camera is then directed to a portion of the 
environment image having high distortion, and current environment and close-up 
images are captured again. (Abstract) However, Liu does not teach the applicants' 
claimed multiple cameras of different types simultaneously capturing images 
of sub-events occurring in a space associated with an event; or a virtual director 
that automatically determines which view of the multiple cameras of different 
types to display based on the positioning of a person speaking and the ability 
to track the speaker in images captured by the cameras, and the associated 
audio of the person speaking, and automatically switches between the 
multiple cameras of different types to display a view of one of the different 
sub-events. 



Since neither Konopka nor Tai nor Taylor nor Ippolito nor Liu teaches the 
applicants' claimed multiple cameras of different types 
simultaneously capturing images of sub-events occurring in a space 
associated with an event, or a virtual director that automatically determines 
which view of the multiple cameras of different types to display based on the 
positioning of a person speaking and the ability to track the speaker in images 
captured by the cameras, and the associated audio of the person speaking, 
and automatically switches between the multiple cameras of different types to 
display a view of one of the different sub-events, the combination does not teach 
it. Thus, the applicants have claimed elements not taught in the cited art and which 
have advantages not recognized therein. Accordingly, no prima facie case of 
obviousness has been established in accordance with the holding of In Re Fine. 
This lack of prima facie showing of obviousness means that the rejected claims are 
patentable under 35 USC 103 over Konopka in view of Tai, Ippolito, Taylor and Liu. 
It is, therefore, respectfully requested that the rejection of Claim 4 be reconsidered 
based on the above-quoted claim language. 

The 35 USC 103 Rejection of Claim 8. 

Claim 8 was rejected under 35 USC 103(a) as being unpatentable over Konopka 
in view of Taylor, in view of Ippolito and in further view of Liu. The Examiner stated that 
Konopka, Ippolito and Taylor teach the applicants' claimed invention, but do not teach 
displaying a higher resolution image of a presenter. However, the Examiner further 
contended that Liu teaches this feature, rendering the applicants' claimed invention 
obvious. The applicants respectfully disagree with this contention of obviousness. 

The limitations of Claim 8 were incorporated into Claim 1, so this rejection is 
moot. However, with respect to the limitations of Claim 8, as incorporated into Claim 1, 
the applicants claim... 



"An automated system for capturing and viewing an event having event 
participants, comprising: 

multiple cameras of different types simultaneously capturing images of 
sub-events occurring in a space associated with an event, wherein the 
multiple cameras of different types are at least two of: 

a 360-degree camera centrally positioned to monitor in substantially 
360-degrees the space in which the event occurs; 

a remote view camera positioned so as to capture a view of event 
participants in said space associated with said event to be transmitted to a 
client over said network; 

a presenter view camera positioned so as to capture a view of an 
overview of the space associated with the event wherein a presenter would 
typically be presenting; and 

a whiteboard capture camera positioned so as to capture strokes 
written on a whiteboard; 

a virtual director that automatically determines which view of said 
multiple cameras of different types to display, wherein said virtual 
director determines which camera view to display by: 

determining if a person is speaking and facing toward a 
display that displays at least one remote event participant, and if so 
using a camera view captured by said remote camera to display; 

determining if a person is talking and the presenter view 
camera can track them and provide a higher resolution image than the 
360-degree camera, and if so using a camera view captured by said 
presenter view camera for display; and 

else, using a camera view captured by said 360-degree 
camera to display; 

a server capable of recording and broadcasting the captured 
sub-events; and 

one or more clients in network connection with said server that view 
portions of the captured event." (emphasis added) 

As discussed above, neither Konopka nor Tai nor Taylor nor Ippolito teaches the 
applicants' claimed multiple cameras of different types simultaneously 
capturing images of sub-events occurring in a space associated with an event; 
or a virtual director that determines which view of the multiple cameras of 
different types to display based on the positioning of a speaker and the ability 
to track a speaker in captured images, and switches between the multiple 
cameras of different types to display a view of one of the different sub-events. 

Liu teaches a method and apparatus for providing multi-resolution video to 
multiple users under hybrid human and automatic control. Initial environment and 
close-up images are captured using a first camera and a PTZ camera. The initial 
images are then stored in memory. Current environment and close-up images are 
captured and then an estimated difference between the initial and current images 
and the true image is determined. The estimated differences are weighted and 
compared and the stored images are updated. A close-up image is then provided to 
each user of the system. The close-up camera is then directed to a portion of the 
environment image having high distortion, and current environment and close-up 
images are captured again. (Abstract) However, Liu does not teach the applicants' 
claimed multiple cameras of different types simultaneously capturing images 
of sub-events occurring in a space associated with an event; or a virtual director 
that automatically determines which view of the multiple cameras of different 
types to display based on the positioning of a person speaking and the ability 
to track the speaker in images captured by the cameras and the associated 
audio of the person speaking, and automatically switches between the 
multiple cameras of different types to display a view of one of the different 
sub-events. 

Since neither Konopka, Tai, Taylor, Ippolito nor Liu teaches the 
applicants' claimed multiple cameras of different types simultaneously 
capturing images of sub-events occurring in a space associated with an event 
where a virtual director automatically determines which view of the multiple 
cameras of different types to display based on the positioning of a person 
speaking and the ability to track the speaker in images captured by the 
cameras and the associated audio of the person speaking, and automatically 
switches between the multiple cameras of different types to display a view of 
one of the different sub-events, the combination does not teach it. Thus, the 
applicants have claimed elements not taught in the cited art and which have 
advantages not recognized therein. Accordingly, no prima facie case of 
obviousness has been established in accordance with the holding of In Re Fine. 
This lack of prima facie showing of obviousness means that the rejected claims are 
patentable under 35 USC 103 over Konopka in view of Tai, Ippolito, Taylor and Liu. 
It is, therefore, respectfully requested that the rejection of Claim 8 be reconsidered 
based on the above quoted claim language. 

The 35 USC 103 Rejection of Claim 13. 

Claim 13 was rejected under 35 USC 103(a) as being unpatentable over 
Konopka in view of Tai in view of Taylor and in further view of Rodriguez, Jr. et al., U.S. 
Patent No. 6,179,426 (herein after Rodriguez). The Examiner stated that Konopka, Tai 
and Taylor teach the applicants' claimed invention, but do not teach a projector for 
projecting images on a screen. However, the Examiner further contended that 
Rodriguez teaches this feature, rendering the applicants' claimed invention obvious. 
The applicants respectfully disagree with this contention of obviousness. 

As discussed above, neither Konopka, Tai nor Taylor teaches the applicants' 
claimed multiple cameras of different types simultaneously capturing images 
of sub-events occurring in a space associated with an event where a virtual 
director automatically determines which view of the multiple cameras of 
different types to display based on the positioning of a person speaking and 
the ability to track the speaker in images captured by the cameras and the 
associated audio of the person speaking, and automatically switches between 
the multiple cameras of different types to display a view of one of the different 
sub-events. Rodriguez also does not teach these claimed features. 

Since neither Konopka, Tai, Taylor nor Rodriguez teaches the applicants' 
claimed multiple cameras of different types simultaneously capturing images 
of sub-events occurring in a space associated with an event; or a virtual director 
that automatically determines which view of the multiple cameras of different 
types to display based on the positioning of a person speaking and the ability 
to track the speaker in images captured by the cameras and the associated 
audio of the person speaking, and automatically switches between the 
multiple cameras of different types to display a view of one of the different 
sub-events, the combination does not teach it. Thus, the applicants have claimed 
elements not taught in the cited art and which have advantages not recognized 
therein. Accordingly, no prima facie case of obviousness has been established in 
accordance with the holding of In Re Fine. This lack of prima facie showing of 
obviousness means that the rejected claims are patentable under 35 USC 103 over 
Konopka in view of Tai, Taylor and Rodriguez. It is, therefore, respectfully 
requested that the rejection of Claim 13 be reconsidered based on the above quoted 
claim language. 



The 35 USC 103 Rejection of Claims 51-54. 

Claims 51-54 were rejected under 35 USC 103(a) as being unpatentable over 
Konopka, in view of Tai, Taylor, Ippolito and in further view of Rodriguez, Jr. et al., 
U.S. Patent No. 6,179,426 (herein after Rodriguez). The Examiner stated that 
Konopka, Tai, Taylor and Ippolito teach the applicants' claimed invention, but do not 
teach the same types of cameras, in particular a whiteboard camera. However, the 
Examiner further contended that Rodriguez teaches this feature, rendering the 
applicants' claimed invention obvious. The applicants respectfully disagree with this 
contention of obviousness. 



The applicants claim 

"A system for conducting a distributed meeting, the system 
comprising: 

a 360-degree camera for capturing images of meeting participants in a 
meeting in substantially 360 degrees about said 360-degree camera; 

a whiteboard camera for capturing images of contents written on a 
whiteboard; 

a presenter camera for capturing images of an overview of the meeting 
room in the area where a presenter would typically be presenting; 

a microphone array for capturing the audio of the meeting that is 
synchronized with one of said images captured by said 360-degree camera, 
whiteboard camera or presenter camera; and 

a virtual director that automatically determines which view of said 360 
degree camera, whiteboard camera or presenter camera to display and 
switches to the determined view of the associated camera to display a view of 
one of said different sub-events, wherein said virtual director determines 
which camera view to display by: 

determining if a person is talking and the presenter view camera 
can track them and provide a higher resolution image than the 360-degree 
camera, and if so using a camera view captured by said presenter view 
camera for display; and 

else, using a camera view captured by said 360-degree camera 
to display; and 

a meeting server for performing processing required to broadcast and 
record meeting data." 

As discussed above, neither Konopka, Taylor nor Ippolito teaches the 
applicants' claimed multiple cameras of different types simultaneously capturing 
images of sub-events occurring in a space associated with an event where a virtual 
director automatically determines which view of the multiple cameras of 
different types to display based on the positioning of a person speaking and 
the ability to track the speaker in images captured by the cameras, and the 
associated audio of the person speaking, and automatically switches between 
the multiple cameras of different types to display a view of one of the different 
sub-events. Rodriguez also does not teach these claimed features. 

Since neither Konopka, Tai, Taylor, Ippolito nor Rodriguez teaches the 
applicants' claimed multiple cameras of different types simultaneously capturing 
images of sub-events occurring in a space associated with an event with a virtual 
director that automatically determines which view of the multiple cameras of different 
types to display based on the positioning of a person speaking and the ability to 
track the speaker in images captured by the cameras and the associated audio of 
the person speaking, and automatically switches between the multiple cameras of 
different types to display a view of one of the different sub-events, the combination 
does not teach it. Thus, the applicants have claimed elements not taught in the 
cited art and which have advantages not recognized therein. Accordingly, no prima 
facie case of obviousness has been established in accordance with the holding of In 
Re Fine. This lack of prima facie showing of obviousness means that the rejected 
claims are patentable under 35 USC 103 over Konopka in view of Tai, Taylor, 
Ippolito and Rodriguez. It is, therefore, respectfully requested that the rejection of 
Claims 51-54 be reconsidered based on the above quoted claim language. 

The 35 USC 103 Rejection of Claim 18. 

Claim 18 was rejected under 35 USC 103(a) as being unpatentable over 
Konopka in view of Tai, in view of Taylor and in further view of Tosaya, U.S. Patent No. 
6,549,230 (herein after Tosaya). The Examiner stated that Konopka, Tai and Taylor 
teach the applicants' claimed invention, but do not teach an event kiosk that is located 
on one of multiple cameras. However, the Examiner further contended that Tosaya 
teaches this feature, rendering the applicants' claimed invention obvious. The 
applicants respectfully disagree with this contention of obviousness. 

As discussed above, neither Konopka nor Taylor teaches the applicants' claimed 
multiple cameras of different types simultaneously capturing images of sub-events 
occurring in a space associated with an event where a virtual director 
automatically determines which view of the multiple cameras of different types to 
display based on the positioning of a person speaking and the ability to track the 
speaker in images captured by the cameras and the associated audio of the person 
speaking, and automatically switches between the multiple cameras of different 
types to display a view of one of the different sub-events. Tosaya also does not 
teach these claimed features. 

Since neither Konopka, Tai, Taylor nor Tosaya teaches the applicants' 
claimed multiple cameras of different types simultaneously capturing images 
of sub-events occurring in a space associated with an event where a virtual 
director automatically determines which view of the multiple cameras of 
different types to display based on the positioning of a person speaking and 
the ability to track the speaker in images captured by the cameras and the 
associated audio of the person speaking, and automatically switches between 
the multiple cameras of different types to display a view of one of the different 
sub-events, the combination does not teach it. Thus, the applicants have claimed 
elements not taught in the cited art and which have advantages not recognized 
therein. Accordingly, no prima facie case of obviousness has been established in 
accordance with the holding of In Re Fine. This lack of prima facie showing of 
obviousness means that the rejected claims are patentable under 35 USC 103 over 
Konopka in view of Tai, Taylor and Tosaya. It is, therefore, respectfully requested 
that the rejection of Claim 18 be reconsidered based on the above-quoted claim 
language. 

The 35 USC 103 Rejection of Claims 69, 71 and 72. 

Claims 69, 71 and 72 were rejected under 35 USC 103(a) as being unpatentable 
over Konopka, in view of Taylor, in view of Ippolito and in view of Rodriguez in further 
view of Tosaya. The Examiner stated that Konopka, Taylor, Ippolito and Rodriguez 
teach the applicants' claimed invention, but do not teach a 360-degree camera that 
includes an integrated computer that performs processing required to broadcast images 
and associated meeting data. However, the Examiner further contended that Tosaya 
teaches this feature, rendering the applicants' claimed invention obvious. The 
applicants respectfully disagree with this contention of obviousness. 

As discussed above, neither Konopka, Taylor, Ippolito nor Rodriguez 
teaches the applicants' claimed multiple cameras of different types simultaneously 
capturing images of sub-events occurring in a space associated with an event 
where a virtual director automatically determines which view of the multiple 
cameras of different types to display based on the positioning of a person 
speaking and the ability to track the speaker in images captured by the 
cameras and the associated audio of the person speaking, and automatically 
switches between the multiple cameras of different types to display a view of 
one of the different sub-events. 

Tosaya teaches a portable video conference module supporting a network- 
based video conference comprising a processor, a video camera, an audio input 
device and several interfaces coupled to the processor. The processor includes a 
local instruction processor accessing a local non-volatile memory. The interfaces 
include a wireless data capture interface, a video display interface, an audio output 
interface and a network interface. But Tosaya does not teach the applicants' claimed 
multiple cameras of different types simultaneously capturing images of sub- 
events occurring in a space associated with an event where a virtual director 
automatically determines which of the multiple cameras of different types to 
display based on the positioning and tracking of a speaker, and switches 
between the multiple cameras of different types to display a view of one of the 
different sub-events. 

Since neither Konopka, Tai, Taylor, Ippolito nor Tosaya teaches the 
applicants' claimed multiple cameras of different types simultaneously 
capturing images of sub-events occurring in a space associated with an event 
where a virtual director automatically determines which of the multiple 
cameras of different types to display based on the positioning and tracking of 
a speaker, and switches between the multiple cameras of different types to 
display a view of one of the different sub-events, the combination does not teach 
it. Thus, the applicants have claimed elements not taught in the cited art and which 
have advantages not recognized therein. Accordingly, no prima facie case of 
obviousness has been established in accordance with the holding of In Re Fine. 
This lack of prima facie showing of obviousness means that the rejected claims are 
patentable under 35 USC 103 over Konopka, Tai, Taylor, Ippolito and Rodriguez, in 
view of Tosaya. It is, therefore, respectfully requested that the rejection of Claims 
69, 71 and 72 be reconsidered based on the above-quoted claim language. 



In summary, it is believed that claims 1-6, 8-21, 51-61 and 69, 71-72 are 
in condition for allowance. Allowance of these claims at an early date is 
courteously solicited. 